Bias Correction and Confidence Intervals for Fitted Q-iteration

Authors

  • Bibhas Chakraborty
  • Victor Strecher
  • Susan Murphy
Abstract

We consider finite-horizon fitted Q-iteration with linear function approximation to learn a policy from a training set of trajectories. We show that fitted Q-iteration can give biased estimates and invalid confidence intervals for the parameters that feature in the policy. We propose a regularized estimator, called the soft-threshold estimator, derive it as an approximate empirical Bayes estimator, and show via simulated experiments that it reduces bias and improves the coverage rates of confidence intervals. We also demonstrate the use of this method in the analysis of data from a randomized smoking cessation study.
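
To make the bias mechanism concrete, here is a minimal two-stage sketch in Python (NumPy). It assumes binary treatments coded -1/+1, a linear Q-model with history main effects plus treatment-by-history interactions, and hypothetical simulated data. Under this model the stage-1 target r1 + max_a Q2(h2, a) equals r1 plus the fitted main effect plus |psi|, where psi = h2^T (contrast coefficients). Because E|psi_hat| >= |E psi_hat|, the hard max is biased upward when the true contrast is near zero. The `soft=True` branch swaps in a positive-part shrinkage of |psi_hat| with threshold lambda = 3 * Var(psi_hat); this shrinkage form and threshold are assumptions for illustration and may not match the estimator derived in the paper. The function names `fit_ols`, `design`, `stage1_pseudo_outcome`, and `fitted_q_two_stage` are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares: coefficients and their estimated covariance matrix."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, cov


def design(h, a):
    """ASSUMED linear Q-model features: history h (with intercept) plus a * h interactions."""
    return np.hstack([h, a[:, None] * h])


def stage1_pseudo_outcome(h2, a2, r1, r2, soft=False):
    """Fit the stage-2 Q-function and build the stage-1 target r1 + max_a Q2(h2, a)."""
    p = h2.shape[1]
    beta2, cov2 = fit_ols(design(h2, a2), r2)
    main2, contrast2 = beta2[:p], beta2[p:]
    # With a in {-1, +1}: max_a (h2 @ main2 + a * psi) = h2 @ main2 + |psi|.
    psi = h2 @ contrast2
    value_part = np.abs(psi)
    if soft:
        # Estimated variance of each psi_i: h_i^T Cov(contrast2) h_i.
        var_psi = np.einsum("ij,jk,ik->i", h2, cov2[p:, p:], h2)
        lam = 3.0 * var_psi  # ASSUMED threshold; the paper's choice may differ
        # Positive-part shrinkage: 0 when psi^2 <= lam, else |psi| * (1 - lam / psi^2).
        shrink = np.clip(1.0 - lam / np.maximum(psi**2, 1e-12), 0.0, None)
        value_part = value_part * shrink
    return r1 + h2 @ main2 + value_part


def fitted_q_two_stage(h1, a1, r1, h2, a2, r2, soft=False):
    """Two-stage fitted Q-iteration with linear models; returns stage-1 coefficients and covariance."""
    y1 = stage1_pseudo_outcome(h2, a2, r1, r2, soft=soft)
    return fit_ols(design(h1, a1), y1)


# Hypothetical simulated trajectories in which the stage-2 treatment has no effect
# (all contrast coefficients are zero), the case where the hard max is most biased.
rng = np.random.default_rng(0)
n = 300
x1 = rng.normal(size=n)
a1 = rng.choice([-1.0, 1.0], size=n)
x2 = x1 + rng.normal(size=n)
h1 = np.column_stack([np.ones(n), x1])
h2 = np.column_stack([np.ones(n), x2, a1])  # stage-2 history includes the stage-1 action
a2 = rng.choice([-1.0, 1.0], size=n)
r1 = np.zeros(n)
r2 = 0.5 * x2 + 0.3 * a1 + rng.normal(size=n)

beta_hard, _ = fitted_q_two_stage(h1, a1, r1, h2, a2, r2, soft=False)
beta_soft, _ = fitted_q_two_stage(h1, a1, r1, h2, a2, r2, soft=True)
print("hard-max stage-1 coefficients:", np.round(beta_hard, 3))
print("soft-threshold stage-1 coefficients:", np.round(beta_soft, 3))
```

Because the shrinkage factor lies in [0, 1], the soft-threshold target never exceeds the hard-max target; it only pulls down terms whose estimated contrast is small relative to its standard error.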

Similar resources

CFQI: Fitted Q-Iteration with Complex Returns

Fitted Q-Iteration (FQI) is a popular approximate value iteration (AVI) approach that makes effective use of off-policy data. FQI uses a 1-step return value update which does not exploit the sequential nature of trajectory data. Complex returns (weighted averages of the n-step returns) use trajectory data more effectively, but have not been used in an AVI context because of off-policy bias. In ...

Uncertainty quantification in unfolding elementary particle spectra at the Large Hadron Collider

This thesis studies statistical inference in the high energy physics unfolding problem, which is an ill-posed inverse problem arising in data analysis at the Large Hadron Collider (LHC) at CERN. Any measurement made at the LHC is smeared by the finite resolution of the particle detectors and the goal in unfolding is to use these smeared measurements to make nonparametric inferences about the un...

A Bayesian approach to type-specific conic fitting

A perturbative approach is used to quantify the effect of noise in data points on fitted parameters in a general homogeneous linear model, and the results applied to the case of conic sections. There is an optimal choice of normalisation that minimises bias, and iteration with the correct reweighting significantly improves statistical reliability. By conditioning on an appropriate prior, an unb...

Monte Carlo Comparison of Approximate Tolerance Intervals for the Poisson Distribution

The problem of finding tolerance intervals has received considerable attention from researchers, and such intervals are widely used in various statistical fields, including biometry, economics, reliability analysis and quality control. A tolerance interval is a random interval that covers a specified proportion of the population with a specified confidence level. In this paper, we compare approximate tolerance interva...

On the Effect of Bias Estimation on Coverage Accuracy in Nonparametric Estimation

This paper studies the effect of bias correction on confidence interval estimators in the context of kernel-based nonparametric density estimation. We consider explicit plug-in bias correction but, in contrast to standard approaches, we allow the bias estimator to (potentially) have a first-order impact on the distributional approximation. This approach is meant to more accurately capture the f...

Publication date: 2008